Calibrating a multi-species model to time-averaged species’ catches

In this example we will explore how we can learn about models by fitting size spectrum ecological models to data using the “mizer” R package.

Recall, there are three different kinds of size spectrum models in mizer, of increasing complexity:

  1. community model: purely size-based and representative of a single but “average” species across the whole community.

  2. trait-based model, which disaggregates the size spectrum into different groups with different life histories through differences in each “species’” asymptotic size, which in turn determines other life-history parameters such as the size at maturity (Hartvig et al. 2011, Andersen & Pedersen, 2010).

  3. multispecies model - which has the same equations and parameters as the trait-based model but is parameterised to represent multiple species in a real system, where each species can have many differing species-specific traits (Blanchard et al. 2014).

Here we focus on multispecies size spectrum models. In practice, these models have been parametrised in a few different ways depending on data availability for a system or research questions.

Some studies have focused on many species-specific values, for example where each species has different values of life-history and size-selective feeding trait parameters (e.g. \(\beta\) and \(\sigma\)), and details of species interactions (Blanchard et al. 2014, Reum et al. 2018), to better capture the dynamics of marine food webs.

Others, such as Jacobsen et al. (2014, 2016), have represented variation in only a couple of the most important life history parameters for each species - asymptotic size (which links to other parameters such as maturation size) and recruitment parameters (\(R_{max}\), \(eRepro\)) to broadly capture fished communities or carry out across ecosystem comparisons.

Once you have parametrised the multispecies model for your system (section 1), you may find that species do not coexist, or that the biomass or catches are very different from your observations. After the model is parameterised and assessed for basic principles and coexistence (section 3), further calibration to observational data is used to ensure the relative abundance of each species reflects the system (section 4), at least over a relatively stable, time-averaged period.

The background resource parameters and the recruitment parameters, \(R_{max}\) (maximum recruitment) and \(erepro\) (reproductive efficiency), greatly affect the biomasses of species in your system.

The recruitment parameters are highly uncertain and capture density dependent processes in the model that limit the number of offspring that successfully recruit to the smallest size class for each species. In the default mizer package these parameters are used to implement an emergent Beverton-Holt type stock recruitment relationship.
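In mizer's default Beverton-Holt form, the realised (density-dependent) recruitment is \(RDD = RDI/(1 + RDI/R_{max})\), so recruitment saturates at \(R_{max}\) as egg production grows. A minimal base-R illustration (the numbers here are made up, not North Sea values):

```r
# Beverton-Holt density dependence as in mizer's default recruitment:
# realised recruitment RDD saturates at R_max as RDI grows
beverton_holt <- function(rdi, r_max) rdi / (1 + rdi / r_max)

rdi <- c(1e6, 1e8, 1e10)          # density-independent recruitment (made up)
beverton_holt(rdi, r_max = 1e8)   # approaches 1e8 for large rdi
beverton_holt(rdi, r_max = Inf)   # equals rdi: no density dependence
```

Setting \(R_{max} = Inf\), as we do below, therefore switches this density dependence off entirely.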

As a starting point, we will estimate these parameters as a means of fitting the modelled species catches to the observed catches. This could similarly be carried out with biomasses. Other, more detailed approaches also exist (see the main paper), but this approach has been used to get models in the right “ball-park”, which can then be further evaluated using diagnostics (example X) and fitted to time series data (example XX).

A Simple Protocol for Multispecies Model Calibration

We will adapt the “recipe” for calibration in Jacobsen et al. (2014; see supp. mat.) and Blanchard et al. (2014) into the following steps:

  1. Run the model with the chosen species-specific parameters. This relates some of the missing parameters, such as \(h\) and \(\gamma\), to \(w_{inf}\). \(R_{max}\) could also be calculated automatically based on equilibrium assumptions (Andersen et al. 2016), but here we set it to “\(Inf\)”, which means there is no density dependence associated with spawner-recruit dynamics (and, as we will see below, no coexistence).

  2. Obtain the time-averaged data (e.g. catches or biomasses for each species) and the time-averaged fishing mortality inputs (e.g. from stock assessments). Typically these should cover a stable part of the time series for your system.

  3. Start with the chosen parameters for \(\kappa\) and \(\lambda\) of the resource spectrum, obtained from the literature on the community size spectrum. These can be very uncertain and sometimes are not available. Calibrate the carrying capacity of the background resource spectrum, \(\kappa\), by examining the feeding level, biomass through time, and the overall size spectrum.

  4. Calibrate the maximum recruitment, \(R_{max}\), which affects the relative biomass of each species (and, combined with the fishing parameters, the catches), by minimising the error between observed and estimated catches (or, again, biomasses).

  5. Check that the physiological recruitment, \(RDI\), is much higher than the realised recruitment, \(RDD\). This can be done using the getRDD() and getRDI() functions and calculating the ratio, which should be around 100 for a species with \(w_{inf} = 1500g\) (e.g. Whiting/Plaice), but varies with asymptotic size and fishing mortality (Andersen 2019). A high \(RDI/RDD\) ratio indicates that the carrying capacity is controlling the population rather than predation or competition. Larger species often require more of this density-dependent control than smaller ones. If \(RDI/RDD\) is too high, the efficiency of reproduction (\(erepro\)) can be lowered to ensure species do not outcompete others or become over-resilient to fishing. Biologically, lowering \(erepro\) means a higher egg mortality rate or wasteful energy invested into gonads. If \(RDI/RDD = 1\), the species is in the linear part of the stock-recruitment relationship (no spawner-recruit density dependence).

  6. Verify the model after the above step by comparing the model with: species biomass or abundance distributions, feeding level, natural mortality, growth, vulnerability to fishing (\(F_{msy}\)) and catch, and diet composition. Many handy functions for plotting these are available here: https://sizespectrum.org/mizer/reference/index.html

  7. The final verification step is to force the model with time-varying fishing mortality to assess whether changes through time in biomasses and catches capture observed trends. The model will not capture all of the fluctuations from environmental processes (unless some of these are included), but should match the magnitude and general trend in the data (this is covered in Example 2).
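The background resource in step 3 follows a power-law spectrum, \(N_R(w) = \kappa w^{-\lambda}\), so \(\kappa\) sets the intercept (carrying capacity) and \(\lambda\) the slope. A base-R sketch with illustrative values (\(\lambda = 2.05\) is a commonly used default, not a fitted North Sea value):

```r
# Power-law background resource spectrum: N_R(w) = kappa * w^-lambda
resource_spectrum <- function(w, kappa, lambda) kappa * w^-lambda

w    <- 10^seq(-3, 1, length.out = 50)  # size grid in grams (illustrative)
n_pp <- resource_spectrum(w, kappa = 1e11, lambda = 2.05)

# On log-log axes this is a straight line with slope -lambda
fit <- lm(log10(n_pp) ~ log10(w))
unname(coef(fit)[2])  # -2.05
```

This is why \(\kappa\) shifts the whole community's food baseline up or down: it scales the resource abundance at every size.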

Step 0. Run the model with the chosen species-specific parameters.

In this section you will:

  • obtain or create a dataframe of species-specific parameters

  • run the dataframe through the mizer package and examine the model output

A species-specific dataframe containing the North Sea model parameters is already stored in mizer (it may not be up to date, however, so we use the .csv file in the repository).

# if the user has not installed the required packages
# install.packages("tidyverse")
# install.packages("plotly")
#devtools::install_github("sizespectrum/mizer")
#devtools::install_github("sizespectrum/mizerExperimental")

library(mizerExperimental) # for projectToSteady()
library(mizer)
library(tidyverse)
library(plotly)

# loading North Sea data
nsParams <- read.csv("data/nsparams.csv")[,-1]

# This data frame already has Rmax values, let's remove them to calibrate them again later
nsParams[,"r_max"] <- Inf

# If you want to make it less multi-species and more trait-based model
# nsParams[,"beta"] <-100
# nsParams[,"sigma"] <-1.5
params_uncalibrated <- newMultispeciesParams(nsParams, inter, kappa = 1e11, max_w=1e6) # inter comes with loading "mizer"

# note the volume of this model is set to reflect the entire volume of the North Sea - hence the very large kappa value. This is system-specific and you may want to work with per m^3 as in the defaults.

#  Add other params for info
#  param$Volumecubicmetres=5.5e13    #unit of volume. Here total volume of North sea is used (Andersen & Ursin 1977)

# have a look at species parameters that have been calculated
# params_uncalibrated@species_params

# alternative view without redundant parameters to reduce the size of the data frame on the screen
params_uncalibrated@species_params[,-which(colnames(params_uncalibrated@species_params) %in% 
                                             c("sel_func","gear","interaction_resource","pred_kernel_type","m","alpha","n","p","q","w_min"))]
##         X1 species   w_inf w_mat   beta sigma R_max  k_vb  l25  l50     a     b
## Sprat    1   Sprat    33.0    13  51076   0.8   Inf 0.681  7.6  8.1 0.007 3.014
## Sandeel  2 Sandeel    36.0     4 398849   1.9   Inf 1.000  9.8 11.8 0.001 3.320
## N.pout   3  N.pout   100.0    23     22   1.5   Inf 0.849  8.7 12.2 0.009 2.941
## Herring  4 Herring   334.0    99 280540   3.2   Inf 0.606 10.1 20.8 0.002 3.429
## Dab      5     Dab   324.0    21    191   1.9   Inf 0.536 11.5 17.0 0.010 2.986
## Whiting  6 Whiting  1192.0    75     22   1.5   Inf 0.323 19.8 29.0 0.006 3.080
## Sole     7    Sole   866.0    78    381   1.9   Inf 0.284 16.4 25.8 0.008 3.019
## Gurnard  8 Gurnard   668.0    39    283   1.8   Inf 0.266 19.8 29.0 0.004 3.198
## Plaice   9  Plaice  2976.0   105    113   1.6   Inf 0.122 11.5 17.0 0.007 3.101
## Haddock 10 Haddock  4316.5   165    558   2.1   Inf 0.271 19.1 24.3 0.005 3.160
## Cod     11     Cod 39851.3  1606     66   1.3   Inf 0.216 13.2 22.9 0.005 3.173
## Saithe  12  Saithe 39658.6  1076     40   1.1   Inf 0.175 35.3 43.6 0.007 3.075
##         catchability        h k       ks         z0        gamma     w_mat25
## Sprat     1.29533333 14.46675 0 1.593753 0.18705957 5.652974e-11   11.647460
## Sandeel   0.06510547 25.62741 0 2.936414 0.18171206 3.790575e-11    3.583834
## N.pout    0.31380000 31.20422 0 3.372902 0.12926608 9.750228e-11   20.607045
## Herring   0.18150000 28.36363 0 2.920263 0.08647736 2.514308e-11   88.699888
## Dab       0.97800000 34.87720 0 3.781368 0.08735805 7.579184e-11   18.815128
## Whiting   0.24266667 31.77220 0 3.301616 0.05658819 9.927702e-11   67.196884
## Sole      0.37383333 24.73805 0 2.567302 0.06294752 5.184323e-11   69.884760
## Gurnard   0.46250569 20.64990 0 2.193126 0.06863713 4.638552e-11   34.942380
## Plaice    0.18483333 16.94072 0 1.740765 0.04171321 4.489177e-11   94.075638
## Haddock   0.30150000 41.46028 0 4.196598 0.03685027 7.707429e-11  147.833146
## Cod       0.26666749 69.40226 0 6.511735 0.01756590 2.325827e-10 1438.909287
## Saithe    0.39300000 59.95343 0 5.700788 0.01759431 2.435635e-10  964.051303
##         erepro
## Sprat        1
## Sandeel      1
## N.pout       1
## Herring      1
## Dab          1
## Whiting      1
## Sole         1
## Gurnard      1
## Plaice       1
## Haddock      1
## Cod          1
## Saithe       1
# lets' change the plotting colours
# looks good but hard to distinguish some species
# library(viridis)
# params_uncalibrated@linecolour[1:12] <-plasma(dim(params_uncalibrated@species_params)[1])

# easier to read plots
library(pals)
params_uncalibrated@linecolour[1:12] <-glasbey(dim(params_uncalibrated@species_params)[1])

params_uncalibrated@linecolour["Resource"] <-"seagreen3"

# run with fishing
sim_uncalibrated <- project(params_uncalibrated, t_max = 100, effort = 1)

plotSummary(sim_uncalibrated, short = T)

Oh dear, all of the species but three have collapsed! This is because there was no density dependence (\(R_{max}\) is set at \(Inf\)) and the largest species (cod and saithe) have outcompeted all of the rest.

Step 1. Obtain the time-averaged data (e.g. catches or biomasses for each species) and the time-averaged fishing mortality inputs (e.g. from stock assessments).

In this section you will:

  • Download fisheries data and process them in a format comparable to the model output

The following .csv files are extracted from the ICES database using “data/getICESFishdata_param.R”. Fishing data are averaged over 2014-2019, a relatively stable period in catches.

# fisheries mortality F
fMat <- read.csv("data/fmat.csv")
fMatWeighted <- read.csv("data/fmatWeighted.csv") # Sandeel and Cod have multiple databases, so we average their F, weighted by SSB

# read in time-averaged  catches  
catchAvg <-read.csv("data/time-averaged-catches.csv") # only that one is used at the moment | catches are estimated from fMatW

# ssb
ssbAvg <- read.csv("data/time-averaged-SSB.csv")

Step 2. Calibrate the carrying capacity of the background resource spectrum, \(\kappa\), at steady state

In this section you will:

  • guess reasonable \(erepro\) and \(R_{max}\) values, which will stop a few species from out-competing the others

  • vary \(\kappa\) values until you reach coexistence for all species

  • check the species’ growth curves, which are influenced by \(\kappa\), to see if they diverge from the von Bertalanffy curves (data-poor case) or growth data (data-rich case)

# the fishing mortality rates are already stored in the param object as
params_uncalibrated@species_params$catchability
##  [1] 1.29533333 0.06510547 0.31380000 0.18150000 0.97800000 0.24266667
##  [7] 0.37383333 0.46250569 0.18483333 0.30150000 0.26666749 0.39300000
# let's start again and replace with the initial pre-calibration "guessed" Rmax 
params_guessed <- params_uncalibrated
# penalise the large species with higher density dependence
params_guessed@species_params$R_max <- params_guessed@resource_params$kappa*params_guessed@species_params$w_inf^-1
# and reduce erepro
params_guessed@species_params$erepro <- 1e-3

params_guessed <- setParams(params_guessed)
# run with fishing
sim_guessed <- project(params_guessed, t_max = 100, effort =1)
plotSummary(sim_guessed, short = T)

Here, Sprat’s biomass is orders of magnitude lower than that of the other species (as is the abundance of Saithe’s largest individuals), but at least the species coexist.

In mizerExperimental, the projectToSteady() function searches for a steady state of the model for a given set of initial parameters. Let’s see how it behaves with our model:

## compare with Gustav's projectToSteady
params_steady <- projectToSteady(params_guessed, t_max = 100, return_sim = F)
sim_steady <- project(params_steady, t_max = 300, effort =1)
plotSummary(sim_steady, short = T)

# zoom on the size spectrum
plotSpectra(sim_steady,power=2,total = T)

Species are coexisting (but Sprat’s biomass is still low). This is in part because we applied a stronger \(R_{max}\) effect for larger species. You can play with the above parameters, but it would take a lot of trial and error to achieve the right combination to get the biomass or catches similar to the observations.

We could explore the effects further using the R Shiny app, where we also have a plot of the biomass or catch data. First let’s look at the basic diagnostics and tune \(\kappa\) and \(erepro\) to make sure the feeding levels are high enough for each species and that the species coexist.

(The R Shiny exploration section is disabled in the pdf version of this document.)

The Shiny app makes it easier to fine-tune one parameter at a time and helps with understanding the model, but it is tricky to arrive at the best fit with it, especially if we want to change several species-specific parameters at once.

However, because varying \(\kappa\) affects the species’ growth curves, we need to check the modelled growth curves against the “observed” von Bertalanffy growth curves in case the emergent growth curves have diverged too far.

# All emergent and observed growth curves as panels
plotGrowthCurves(sim_guessed, species_panel = T)

# All emergent growth curves together
plotGrowthCurves(sim_guessed, percentage = T)

# One by one observed vs emergent
plotGrowthCurves(sim_guessed, species = "Cod")
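For reference, the “observed” curves in these plots come from the von Bertalanffy growth function, \(L(t) = L_{\infty}(1 - e^{-kt})\), converted to weight with the length-weight relationship \(w = aL^b\) (the k_vb, a and b columns of the species parameter data frame). A standalone sketch with cod-like but purely illustrative values:

```r
# von Bertalanffy length-at-age converted to weight via w = a * L^b
vb_weight <- function(age, linf, k_vb, a, b) {
  len <- linf * (1 - exp(-k_vb * age))  # length (cm) at age (years)
  a * len^b                             # weight (g) at that length
}

age <- 0:20
w   <- vb_weight(age, linf = 120, k_vb = 0.216, a = 0.005, b = 3.173)  # cod-like

# weight approaches the asymptotic weight a * linf^b with age
w_inf <- 0.005 * 120^3.173
w[length(w)] / w_inf  # > 0.95
```

If the emergent curves sit well below such a reference curve, the species is likely food-limited in the model, which can often be eased by raising \(\kappa\).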

To conclude this section, we choose values that enable the most species to coexist as a starting point for optimisation. Note we won’t vary \(erepro\) at the same time as \(R_{max}\) (they depend on each other). However, we will use the value of \(erepro\) selected from the Shiny app.

Step 3. Calibrate the maximum recruitment

In this section you will:

  • use a package that will calibrate \(R_{max}\) per species

\(R_{max}\) affects the relative biomass of each species (and, combined with the fishing parameters, the catches); we estimate it by minimising the error between observed and estimated catches or biomasses. We could also include \(\kappa\) in the estimation here (as in Blanchard et al. 2014 and Spence et al. 2016), but instead we will use the value that seemed OK in terms of feeding levels in the R Shiny app, roughly \(10^{11.5}\). The same goes for \(erepro\): a value of \(10^{-3}\) seemed OK.

First, let’s set up a function that runs the model and outputs the difference between predicted catches (getYield()) and observed catches (catchAvg). err is the sum of squared errors between the two.
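Before calling the full getError() (which projects the model with the candidate \(\log_{10} R_{max}\) values and then compares yields with data), it helps to see the error measure on its own. A minimal sketch, assuming a log-scale sum of squared errors; the exact transform inside the real getError() may differ:

```r
# Sum of squared errors between modelled and observed catches, on a log
# scale so that small- and large-catch species contribute comparably
catch_error <- function(pred, obs) sum((log(pred) - log(obs))^2)

obs  <- c(Sprat = 2e5, Cod = 5e4)  # observed catches in tonnes (made up)
pred <- c(Sprat = 1e5, Cod = 6e4)  # modelled yields (made up)
catch_error(pred, obs)  # ~0.51
```

Working on the log scale matters because raw tonnes would let the single largest fishery dominate the fit.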

# we need 12 Rmaxs, log10 scale
vary <- log10(params_steady@species_params$R_max)
#vary <- runif(12,3,12) # or use completely made-up initial values (one per species) to test for effects of initial values

## set up the environment to keep the current state of the simulations 
state <- new.env(parent = emptyenv())
state$params <-  params_steady

catchAvg <-read.csv("data/time-averaged-catches.csv") # only that one is used at the moment | catches are estimated from fMatW

## test it
err<-getError(vary = vary, params = params_steady, dat = catchAvg$Catch_1419_tonnes)
## Convergence was achieved in 24 years.
# err<-getError(vary,params,dat=rep(100,12),data_type="biomass")
err
## [1] 85.82931

Now, carry out the optimisation. There are several optimisation methods to choose from, and we want the most robust one to share here. The R package optimParallel seems to be the most robust general option; it runs optim’s L-BFGS-B method in parallel. The procedure often needs repeating several times, but the advantage of the parallel runs is speed compared to packages such as optimx.

This might take a while. The output is saved as “optim_para_result” if you wish to skip this block.

library("parallel")
library("optimParallel")
library("tictoc")

# change kappa and erepro based on shiny exploration, set up initial values based on "close to" equilibrium values from above sim
# params_steady already set to erepro = 0.001 and kappa = 10^11

params_optim <- params_guessed
vary <-  log10(params_optim@species_params$R_max)


params_optim@resource_params$kappa<-3.2e11 # better kappa estimated from Rshiny
params_optim<-setParams(params_optim)

noCores <- detectCores() - 1 # keep a spare core

cl <- makeCluster(noCores, setup_timeout = 0.5)
setDefaultCluster(cl = cl)
clusterExport(cl, as.list(ls()))
clusterEvalQ(cl, {
  library(mizerExperimental)
  library(optimParallel)
})

tic()
optim_result <-optimParallel(par=vary,getError,params=params_optim, dat = catchAvg$Catch_1419_tonnes, method   ="L-BFGS-B",lower=c(rep(3,12)),upper= c(rep(15,12)),
                            parallel=list(loginfo=TRUE, forward=TRUE))

stopCluster(cl)
toc() # 80'' using 47 cores
saveRDS(optim_result,"optim_para_result.RDS")
# run this if the previous block was not evaluated
params_optim <- params_guessed
params_optim@resource_params$kappa<-3.2e11 

optim_result <- readRDS("optim_para_result.RDS")
# optim values:
params_optim@species_params$R_max <- 10^optim_result$par 

# set the param object 
params_optim <-setParams(params_optim)
sim_optim <- project(params_optim, effort = 1, t_max = 100, dt=0.1,initial_n = sim_guessed@n[100,,],initial_n_pp = sim_guessed@n_pp[100,])
saveRDS(sim_optim,"optim_para_sim.RDS")
plotSummary(sim_optim)

Step 4. Check the level of density dependence.

In this section you will:

  • check the \(RDI/RDD\) ratio and infer its consequences for the ecosystem

plot_dat <- as.data.frame(getRDI(sim_optim@params)/getRDD(sim_optim@params))
plot_dat$species <- factor(rownames(plot_dat),sim_optim@params@species_params$species)
colnames(plot_dat)[1] <- "ratio"
plot_dat$w_inf <- as.numeric(sim_optim@params@species_params$w_inf)

# trying to have bars at their w_inf but on a continuous scale
plot_dat$label <- plot_dat$species
plot_dat2 <- plot_dat
plot_dat2$ratio <- 0
plot_dat2$label <- NA
plot_dat <- rbind(plot_dat,plot_dat2)
  
ggplot(plot_dat) +
    geom_line(aes(x = w_inf, y = ratio, color = species), size = 20, alpha = .8) +
    geom_text(aes(x = w_inf, y = ratio, label = label),position = position_stack(vjust = 0.5), angle = 30)+
    scale_color_manual(name = "Species", values = sim_optim@params@linecolour) +
    scale_y_continuous(name = "R0") +
    scale_x_continuous(name = "Asymptotic size (g)", trans = "log10") +
    theme(legend.position = "none") 
## Warning: Removed 12 rows containing missing values (geom_text).

getRDI(sim_optim@params)/getRDD(sim_optim@params)
##      Sprat    Sandeel     N.pout    Herring        Dab    Whiting       Sole 
##   1.000534   2.940145   4.271215   2.510232   6.241042  14.389434  85.672519 
##    Gurnard     Plaice    Haddock        Cod     Saithe 
##  15.051169   1.604304  40.538347 118.077148   2.486730
# seems like there is little density dependence 

# # if needed change erepro & plug back into model
 # params@species_params$erepro[] <-1e-3
 # params <- setParams(params)
 # sim <- project(params, effort = 1, t_max = 500, dt=0.1)
 # plot(sim)

Is the physiological recruitment, \(RDI\), much higher than the realised recruitment, \(RDD\)? A high \(RDI/RDD\) ratio indicates strong density dependence.
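Under mizer's Beverton-Holt recruitment, \(RDD = RDI/(1 + RDI/R_{max})\), the ratio reduces to \(RDI/RDD = 1 + RDI/R_{max}\), which makes the two levers explicit: lowering \(erepro\) lowers \(RDI\) (and so the ratio), while lowering \(R_{max}\) raises it. A quick check with made-up numbers:

```r
# With RDD = RDI / (1 + RDI / R_max), the density-dependence ratio is
#   RDI / RDD = 1 + RDI / R_max
dd_ratio <- function(rdi, r_max) 1 + rdi / r_max

dd_ratio(rdi = 1e8, r_max = 1e8)  # 2: recruitment halved by density dependence
dd_ratio(rdi = 1e6, r_max = 1e8)  # ~1: nearly linear part of the curve

# halving erepro halves rdi, which halves the excess of the ratio above 1
(dd_ratio(5e7, 1e8) - 1) / (dd_ratio(1e8, 1e8) - 1)  # 0.5
```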

Step 5. Verify the model after the above step by comparing the model with data.

E.g. species biomass or abundance distributions, feeding level, natural mortality, growth, vulnerability to fishing (\(F_{msy}\)) and catch, diet composition… Many handy functions for plotting these are available here: https://sizespectrum.org/mizer/reference/index.html

plotSummary(sim_optim)

plotPredObsYield(sim_optim,catchAvg$Catch_1419_tonnes) 

plotDiet2(sim_optim@params,"Cod") 

plotGrowthCurves(sim_optim, species_panel = T)

# interactive plots / won't be displayed in pdf
plotlyBiomass(sim_optim)
plotlySpectra(sim_optim)
plotlySpectra(sim_optim,power=2,total = T)
plotlyGrowthCurves(sim_optim,percentage = T) 
plotlyFeedingLevel(sim_optim) 
plotlyPredMort(sim_optim)
plotlyFMort(sim_optim)
# What would happen if we also parameterised the interaction matrix or beta and sigma?

Now that our model is calibrated, let’s take a look at \(F_{msy}\).

# need panel plots (per species) of yield at equilibrium (per million) vs effort * catchability

# need catch values / biomass values for different efforts
# sim <- readRDS("~/HowToMizer/optim_para_sim.RDS")

sim <- sim_optim
# plot_datFisheries <-NULL # df that gets biomass/yield/effort/growth
# effortSeq <- seq(0,7,.1) 
# # trade-off -> some species have low catchability so need to run the effort really high to get them to drop in biomass 
# # but at the same time this can make some species go extinct and crash some functions
# for(iEffort in effortSeq)
# {
#   tempSim <- project(sim, effort = iEffort,t_max = 30)
#   #biomass
#   bm <- getBiomassFrame(tempSim)
#   myDat <- filter(bm, Year == max(unique(bm$Year)))
#   #catch
#   yieldDat <- getYield(tempSim)
#   myDat$yield <- yieldDat[dim(yieldDat)[1],]
#   myDat$Year <- NULL
#   myDat$effort <- iEffort
#   #growth
#   growth_dat <- getEGrowth(tempSim@params, n = tempSim@n[dim(tempSim@n)[1],,], n_pp = tempSim@n_pp[dim(tempSim@n_pp)[1],])
#   myDat$growth <- apply(growth_dat,1,sum)
#   plot_datFisheries <- rbind(plot_datFisheries,myDat)
#   #fisheries
#   
# }
# # adjust effort > effort * catchability
# plot_datFisheries$effort <- plot_datFisheries$effort*rep(sim@params@species_params$catchability,length(effortSeq))
# saveRDS(plot_datFisheries, "Fmsy.rds")

plot_datFisheries <- readRDS("Fmsy.rds")

# fishing mortality vs yield
ggplot(plot_datFisheries) +
geom_line(aes(x = effort, y = yield, color = Species))+
  facet_wrap(Species~., scales = "free") +
  scale_x_continuous(limits= c(0,1))+#, limits = c(1e10,NA))+
  scale_y_continuous(trans = "log10") +
  scale_color_manual(name = "Species", values = sim@params@linecolour) +
    theme(legend.position = "bottom", legend.key = element_rect(fill = "white"))

Step 6. The final verification step is to force the model with time-varying fishing mortality to assess whether changes through time in biomasses and catches capture observed trends. The model will not capture all of the fluctuations from environmental processes (unless some of these are included), but should match the magnitude and general trend in the data. We explore this in Example 2 - Changes through time.